Efficient and exact computation of inclusion dependencies for data integration

Authors

  • Jana Bauckmann
  • Ulf Leser
  • Felix Naumann
Abstract

Data obtained from foreign data sources often come with only superficial structural information, such as relation names and attribute names. Other types of metadata that are important for effective integration and meaningful querying of such data sets are missing. In particular, relationships among attributes, such as foreign keys, are crucial metadata for understanding the structure of an unknown database. The discovery of such relationships is difficult because, in principle, each pair of data values must be compared for each pair of attributes in the database. A precondition for a foreign key is an inclusion dependency (IND) between the key and the foreign key attributes. We present Spider, an algorithm that efficiently finds all INDs in a given relational database. It leverages the sorting facilities of the DBMS but performs the actual comparisons outside of the database to save computation. Spider analyzes very large databases up to an order of magnitude faster than previous approaches. We also evaluate in detail the effectiveness of several heuristics to reduce the number of necessary comparisons. Furthermore, we generalize Spider to find composite INDs covering multiple attributes, and partial INDs, which are true INDs for all but a certain number of values. This last type is particularly relevant when integrating dirty data, as is often the case in the life sciences domain, our driving motivation.
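To make the notion of an inclusion dependency concrete, here is a minimal Python sketch of a unary IND check over sorted, de-duplicated attribute values. It is not the Spider algorithm from the paper, only a pairwise merge-style comparison under stated assumptions: the function names, the `max_missing` parameter (counted here in distinct values) and the sample columns are all hypothetical, and null handling is ignored.

```python
from itertools import permutations


def is_included(dep_sorted, ref_sorted, max_missing=0):
    """Check whether dep_sorted is contained in ref_sorted.

    Both inputs must be sorted lists of distinct values. Up to
    max_missing distinct values may be absent (a partial IND);
    max_missing=0 demands an exact IND.
    """
    missing = 0
    j = 0
    for value in dep_sorted:
        # Advance the referenced cursor until it reaches or passes value.
        while j < len(ref_sorted) and ref_sorted[j] < value:
            j += 1
        if j == len(ref_sorted) or ref_sorted[j] != value:
            missing += 1
            if missing > max_missing:
                return False
    return True


def unary_inds(columns, max_missing=0):
    """Return all (dependent, referenced) attribute pairs that satisfy the IND."""
    # Sort and de-duplicate once per attribute (illustrative stand-in for
    # letting a DBMS deliver sorted distinct values).
    sorted_cols = {name: sorted(set(vals)) for name, vals in columns.items()}
    return [
        (dep, ref)
        for dep, ref in permutations(sorted_cols, 2)
        if is_included(sorted_cols[dep], sorted_cols[ref], max_missing)
    ]


# Hypothetical sample data, not taken from the paper.
columns = {
    "orders.customer_id": [1, 2, 2, 3],
    "customers.id": [1, 2, 3, 4],
    "reviews.customer_id": [2, 3, 9],
}

exact = unary_inds(columns)
partial = unary_inds(columns, max_missing=1)
print(exact)
print(partial)
```

In this sample, `orders.customer_id` is fully contained in `customers.id`, which makes it a foreign key candidate. `reviews.customer_id` only qualifies once one missing value is tolerated, which is the kind of dirty-data case that partial INDs are meant to capture.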


Similar resources

An Efficient Method for Computing Exact Path Delay Fault Coverage

We describe algorithms and data structures for accurate and efficient computation of path delay fault coverage. Our method uses an interval-based representation of consecutively numbered path delay faults. We describe a modified 2-3 tree data structure to store and manipulate these intervals to keep track of tested faults. Some results obtained using non-robust simulation of benchmark circuits sug...


Efficient and Accurate B-rep Generation of Low Degree Sculptured Solids Using Exact Arithmetic: II - Computation

We present efficient algorithms for exact boundary computation on low degree sculptured CSG solids using exact arithmetic. These include algorithms for computing the intersection curves of low-degree trimmed parametric surfaces, decomposing them into multiple components for efficient point location queries inside the trimmed regions, and computing the boundary of the resulting solid using topologic...


Efficient and Accurate B-rep Generation of Low Degree Sculptured Solids using Exact Arithmetic

We present efficient representations and algorithms for exact boundary computation on low degree sculptured CSG solids using exact arithmetic. Most of the previous work using exact arithmetic has been restricted to polyhedral models. In this paper, we generalize it to higher order objects, whose boundaries are composed of rational parametric surfaces. The use of exact arithmetic and representatio...


Efficient and Accurate B-rep Generation of Low Degree Sculptured Solids Using Exact Arithmetic: II - Computation

We present efficient algorithms for exact boundary computation on low degree sculptured CSG solids using exact arithmetic. These include algorithms for computing the intersection curves of low-degree trimmed parametric surfaces, decomposing them into multiple components for efficient point location queries inside the trimmed regions, and computing the boundary of the resulting solid using topologic...


Efficient B-rep Generation of Low Degree Sculptured Solids using Exact Arithmetic

We present efficient representations and algorithms for exact boundary computation on low degree sculptured CSG solids using exact arithmetic. Most of the previous work using exact arithmetic has been restricted to polyhedral models. In this paper, we generalize it to higher order objects, whose boundaries are composed of rational parametric surfaces. The use of exact arithmetic and representatio...




Publication date: 2009